# Multi-Scenario Applicability
Nano Image Captioning
Apache-2.0
A lightweight image captioning model based on bert-tiny and vit-tiny. It is only 40 MB in size and runs very fast on CPU.
Image-to-Text
Transformers, English

cnmoro
184
3
Vitpose Plus Huge
Apache-2.0
ViTPose++ is a Vision Transformer-based foundation model for human pose estimation that reaches 81.1 AP on the MS COCO keypoint test set.
Pose Estimation
Transformers

usyd-community
14.49k
6
Distilvit
Apache-2.0
A vision-language model that pairs a ViT image encoder with a distilled GPT-2 text decoder to generate image captions.
Image-to-Text
Transformers

Mozilla
290
19
T5 Base Spellchecker
A spell checker built on the T5-Base transformer that detects and corrects spelling errors in text.
Large Language Model
Transformers

Bhuvana
95
13